Abstract — Based on prior work by Eckford, it is shown how 
expectation maximization (EM) may be viewed, and used, as a 
message passing algorithm in factor graphs. 

I. Introduction 

Graphical models [1] such as factor graphs [2] are tools both 
for system modeling and for the development of algorithms for 
detection and estimation, cf. [3], [4]. In addition to the basic 
sum-product and max-product (or min-sum) algorithms, which 
dominate coding applications, signal processing techniques 
including LMMSE/Kalman filtering, gradient algorithms, and 
particle filters can be naturally viewed and used as message 
passing in factor graphs [3], [4]. 

Expectation maximization (EM) [5] [6] has also become 
a standard technique for parameter estimation in graphical 
models [7] [8]. In particular, Eckford showed how EM can 
be viewed, and used, as a technique for breaking cycles in 
factor graphs [9], [10]. However, it is not obvious if and how 
EM can be described as a message passing algorithm with 
local message update rules. 

In the present paper, we develop EM as a message passing 
technique. The standard "global" view of EM is thus replaced 
by a "local" message passing view with a new (local) message 
computation rule for continuous variables. The new message 
computation rule can often be used in cases where the standard 
sum-product (integral-product) rule yields impractical expressions 
for the messages. 

II. Review of EM Algorithm 

We begin by reviewing the expectation maximization (EM) 
algorithm in a setting which is suitable for the purpose of this 
paper. Suppose we wish to find 

$$\operatorname*{argmax}_{\theta} f(\theta). \tag{1}$$



We assume that $f(\theta)$ is the "marginal" of some real-valued 
function $f(x,\theta)$: 

$$f(\theta) \triangleq \int_x f(x,\theta), \tag{2}$$



where $\int_x g(x)$ denotes either integration or summation of $g(x)$ 
over the whole range of $x$. The function $f(x,\theta)$ is assumed 
to be nonnegative: 

$$f(x,\theta) \geq 0 \quad \text{for all } x \text{ and all } \theta. \tag{3}$$



We will also assume that the integral (or the sum) 
$\int_x f(x,\theta) \log f(x,\theta')$ exists for all $\theta$, $\theta'$. The EM algorithm 
attempts to compute (1) as follows: 
1) Make some initial guess $\hat{\theta}^{(0)}$. 

2) Expectation step: evaluate 

$$f^{(k)}(\theta) \triangleq \int_x f(x,\hat{\theta}^{(k)}) \log f(x,\theta). \tag{4}$$

3) Maximization step: compute 

$$\hat{\theta}^{(k+1)} \triangleq \operatorname*{argmax}_{\theta} f^{(k)}(\theta). \tag{5}$$

4) Repeat 2-3 until convergence or until the available time 
is over. 

Fig. 1. Factor graph corresponding to (7). 

The main property of the EM algorithm is 

$$f(\hat{\theta}^{(k+1)}) \geq f(\hat{\theta}^{(k)}). \tag{6}$$

For completeness, a proof of (6) is given in the appendix. 
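
The iteration of steps 1) to 4) can be sketched numerically. The following is a minimal illustration under stated assumptions, not part of the derivation above: the toy function `f_joint`, the constants `WEIGHTS` and `CENTERS`, and the finite grid `GRID` over which the argmax in (5) is taken are all hypothetical choices made for this example. The sketch runs the expectation and maximization steps and records $f$ at every iterate, so that property (6) can be checked directly.

```python
import math

# Hypothetical toy model (not from the paper): x in {0, 1, 2}, theta on a grid.
WEIGHTS = [0.5, 0.3, 0.2]
CENTERS = [-2.0, 0.5, 3.0]
GRID = [i / 100.0 for i in range(-500, 501)]


def f_joint(x, theta):
    """Strictly positive f(x, theta), so that log f is always defined."""
    return WEIGHTS[x] * math.exp(-0.5 * (theta - CENTERS[x]) ** 2)


def f_marginal(theta):
    """f(theta) as in (2): sum of f(x, theta) over all x."""
    return sum(f_joint(x, theta) for x in range(3))


def em_step(theta_k):
    """One expectation step (4) and one maximization step (5) over GRID."""
    def q(theta):
        return sum(f_joint(x, theta_k) * math.log(f_joint(x, theta))
                   for x in range(3))
    return max(GRID, key=q)


def run_em(theta0, steps=50):
    """Iterate until the estimate stops changing or the step budget is used."""
    trace = [theta0]
    for _ in range(steps):
        nxt = em_step(trace[-1])
        if nxt == trace[-1]:
            break
        trace.append(nxt)
    return trace


trace = run_em(GRID[0])              # the initial guess lies on the grid
values = [f_marginal(t) for t in trace]
```

Because the initial guess is itself a grid point, the argument behind (6) carries over to the grid-restricted maximization, so `values` should be nondecreasing up to rounding.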

III. Message Passing Interpretation 

We now rewrite the EM algorithm in message passing form. 
In this section, we will assume a trivial factorization 



$$f(x,\theta) = f_A(\theta) f_B(x,\theta), \tag{7}$$



where $f_A(\theta)$ may be viewed as encoding the a priori information 
about $\Theta$. More interesting factorizations (i.e., models 
with internal structure) will be considered in the next section. 

We will use Forney-style factor graphs as in [3], where 
edges represent variables and nodes represent factors. As in 
[3], we will use capital letters for model variables and small 
letters for values of such variables. The factor graph of (7) is 
shown in Fig. 1. In this setup, the EM algorithm amounts to 
iterative recomputation of the following messages: 

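Upwards message $h(\theta)$: 

\begin{align}
h(\theta) &= \frac{\int_x f_B(x,\hat{\theta}^{(k)}) \log f_B(x,\theta)}{\int_{x'} f_B(x',\hat{\theta}^{(k)})} \tag{8} \\
&= \mathrm{E}_{p_B}[\log f_B(X,\theta)], \tag{9}
\end{align}

where $\mathrm{E}_{p_B}$ denotes the expectation with respect to 
the probability distribution 

$$p_B(x|\hat{\theta}^{(k)}) \triangleq \frac{f_B(x,\hat{\theta}^{(k)})}{\int_{x'} f_B(x',\hat{\theta}^{(k)})}. \tag{10}$$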

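Downwards message $\hat{\theta}$: 

\begin{align}
\hat{\theta}^{(k+1)} &= \operatorname*{argmax}_{\theta} \big( \log f_A(\theta) + h(\theta) \big) \tag{11} \\
&= \operatorname*{argmax}_{\theta} \big( f_A(\theta) \cdot e^{h(\theta)} \big). \tag{12}
\end{align}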

The equivalence of this message passing algorithm with (4) 
and (5) may be seen as follows. From (4) and (5), we have 

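\begin{align}
\hat{\theta}^{(k+1)} &= \operatorname*{argmax}_{\theta} \int_x f(x,\hat{\theta}^{(k)}) \log f(x,\theta) \tag{13} \\
&= \operatorname*{argmax}_{\theta} \int_x f_A(\hat{\theta}^{(k)}) f_B(x,\hat{\theta}^{(k)}) \log\big( f_A(\theta) f_B(x,\theta) \big) \tag{14} \\
&= \operatorname*{argmax}_{\theta} \int_x f_B(x,\hat{\theta}^{(k)}) \big( \log f_A(\theta) + \log f_B(x,\theta) \big) \tag{15} \\
&= \operatorname*{argmax}_{\theta} \left( \log f_A(\theta) + \frac{\int_x f_B(x,\hat{\theta}^{(k)}) \log f_B(x,\theta)}{\int_{x'} f_B(x',\hat{\theta}^{(k)})} \right), \tag{16}
\end{align}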



which is equivalent to (8) and (11). 
Some remarks: 

1) The computation (8) or (9) is not an instance of the 
sum-product algorithm. 

2) The message $h(\theta)$ may be viewed as a "log-domain" 
summary of $f_B$. In (12), the corresponding "probability 
domain" summary $e^{h(\theta)}$ is consistent with the factor 
graph interpretation. 

3) A constant may be added to (or subtracted from) $h(\theta)$ 
without affecting (11). 

4) If $f_A(\theta)$ is a constant, the normalization in (8) can be 
omitted. More generally, the normalization in (8) can be 
omitted if $f_A(\theta)$ is constant for all $\theta$ such that 
$f_A(\theta) \neq 0$. However, in contrast to most standard accounts of the 
EM algorithm, we explicitly wish to allow more general 
functions $f_A$. 

5) Nothing changes if we introduce a known observation 
(i.e., a constant argument) $y$ into $f$ such that (7) becomes 

$$f(x,y,\theta) = f_A(y,\theta) f_B(x,y,\theta).$$
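
The message updates (8) to (11) can be sketched in code and compared against the global steps (4) and (5). This is a minimal illustration, assuming a binary $x$, a finite grid of candidate values for $\theta$, and hypothetical factors `f_a` and `f_b` that are not taken from the paper. The prior factor `f_a` is deliberately non-constant, which is the case where the normalization in (8) matters.

```python
import math

# Hypothetical setup (not from the paper): x in {0, 1}, theta on a grid.
GRID = [i / 100.0 for i in range(-300, 301)]
CENTERS = [-1.0, 2.0]


def f_a(theta):
    """Hypothetical non-constant factor f_A(theta)."""
    return math.exp(-0.5 * theta ** 2)


def f_b(x, theta):
    """Hypothetical strictly positive factor f_B(x, theta)."""
    return (0.4, 0.6)[x] * math.exp(-0.5 * (theta - CENTERS[x]) ** 2)


def upward_message(theta_k):
    """h(theta) of (8): expectation of log f_B under p_B(x | theta_k) of (10)."""
    z = sum(f_b(x, theta_k) for x in (0, 1))
    p_b = [f_b(x, theta_k) / z for x in (0, 1)]
    return lambda theta: sum(p_b[x] * math.log(f_b(x, theta)) for x in (0, 1))


def downward_message(h):
    """theta-hat of (11): maximize log f_A(theta) + h(theta) over GRID."""
    return max(GRID, key=lambda theta: math.log(f_a(theta)) + h(theta))


def global_em_step(theta_k):
    """Steps (4) and (5) applied directly to f = f_A * f_B."""
    def q(theta):
        return sum(f_a(theta_k) * f_b(x, theta_k)
                   * math.log(f_a(theta) * f_b(x, theta)) for x in (0, 1))
    return max(GRID, key=q)


theta_k = 0.5
theta_mp = downward_message(upward_message(theta_k))
theta_global = global_em_step(theta_k)
```

For this toy model both routes should select the same grid point, in line with the equivalence (13) to (16).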

IV. Nontrivial Factor Graphs 

The algorithm of the previous section still applies if both 
$\Theta = (\Theta_1, \ldots, \Theta_n)^T$ and $X = (X_0, \ldots, X_n)^T$ are vectors. 
However, opportunities to simplify the computations may arise 
if $f_A$ and $f_B$ have "nice" factorizations. For example, assume 
that $f_B$ factors as 

$$f_B(x,y,\theta) = f_0(x_0) f_1(x_0,x_1,y_1,\theta_1) \cdots f_n(x_{n-1},x_n,y_n,\theta_n), \tag{17}$$

where $y = (y_1, \ldots, y_n)^T$ is some known (observed) vector. 
Such factorizations arise from classical trellis models and state 
space models. The factor graph corresponding to (17) is shown 
in Fig. 2. 

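The upwards message $h(\theta)$ of (9) splits into a sum with one 
term for each node in the factor graph: 

\begin{align}
h(\theta) &= \mathrm{E}\big[ \log\big( f_0(X_0) f_1(X_0,X_1,y_1,\theta_1) \cdots f_n(X_{n-1},X_n,y_n,\theta_n) \big) \big] \tag{18} \\
&= \mathrm{E}[\log f_0(X_0)] + \mathrm{E}[\log f_1(X_0,X_1,y_1,\theta_1)] + \cdots + \mathrm{E}[\log f_n(X_{n-1},X_n,y_n,\theta_n)]. \tag{19}
\end{align}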

Each term 

$$h_k(\theta_k) = \mathrm{E}[\log f_k(X_{k-1},X_k,y_k,\theta_k)] \tag{20}$$

may be viewed as the message out of the corresponding node, 
as indicated in Fig. 2. The constant term $\mathrm{E}[\log f_0(X_0)]$ in (19) 
may be omitted (cf. Remark 3 in Section III). As in (9), all 
expectations are with respect to the probability distribution $p_B$, 
which we here denote by $p_B(x|y,\hat{\theta})$. Note that each term (20) 
requires only $p_B(x_{k-1},x_k|y,\hat{\theta})$, the joint distribution of $X_{k-1}$ 
and $X_k$: 

$$h_k(\theta_k) = \int_{x_{k-1}} \int_{x_k} p_B(x_{k-1},x_k|y,\hat{\theta}) \log f_k(x_{k-1},x_k,y_k,\theta_k). \tag{21}$$

These joint distributions may be obtained by means of the 
standard sum-product algorithm (belief propagation) [2] [3]: 
from elementary factor graph theory, we have 

$$p_B(x_{k-1},x_k|y,\hat{\theta}) \propto f_k(x_{k-1},x_k,y_k,\hat{\theta}_k) \, \mu_{X_{k-1} \to f_k}(x_{k-1}) \, \mu_{X_k \to f_k}(x_k), \tag{22}$$

where $\mu_{X_{k-1} \to f_k}$ and $\mu_{X_k \to f_k}$ are the messages of the 
sum-product algorithm towards the node $f_k$ and where "$\propto$" denotes 
equality up to a scale factor that does not depend on $x_{k-1}$, $x_k$. 
It follows that 

$$p_B(x_{k-1},x_k|y,\hat{\theta}) = \frac{f_k(x_{k-1},x_k,y_k,\hat{\theta}_k) \, \mu_{X_{k-1} \to f_k}(x_{k-1}) \, \mu_{X_k \to f_k}(x_k)}{\int_{x_{k-1}} \int_{x_k} f_k(x_{k-1},x_k,y_k,\hat{\theta}_k) \, \mu_{X_{k-1} \to f_k}(x_{k-1}) \, \mu_{X_k \to f_k}(x_k)}. \tag{23}$$

Note that, if the sum-product messages $\mu_{X_{k-1} \to f_k}$ and $\mu_{X_k \to f_k}$ 
are computed without any scaling, then the denominator in 
(23) equals $p_B(y|\hat{\theta})$, which is independent of $k$. 
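
The computation of the pairwise distributions (23) and of the messages (21) can be illustrated on a small chain. The following sketch assumes binary state variables and hypothetical section factors (`A`, `AMP`, `F0`, `Y`, and the form of `f_k` are invented for this example and are not from the paper). It runs unscaled forward and backward sum-product recursions, forms the pairwise distributions, and includes a brute-force enumeration purely as a cross-check.

```python
import itertools
import math

# Hypothetical chain model (not from the paper): binary states, three sections.
A = [[0.9, 0.1], [0.2, 0.8]]   # transition weights
AMP = [-1.0, 1.0]              # amplitude attached to each state
F0 = [0.5, 0.5]                # f_0(x_0)
Y = [0.8, -1.1, 0.9]           # observations y_1, ..., y_n


def f_k(xp, xc, y, theta):
    """Section factor f_k(x_{k-1}, x_k, y_k, theta_k); strictly positive."""
    return A[xp][xc] * math.exp(-0.5 * (y - theta * AMP[xc]) ** 2)


def pairwise_marginals(theta_hat):
    """Pairwise distributions (23) and their denominators, one per section."""
    n = len(Y)
    fwd = [F0]                                  # fwd[k]: message on X_k from the left
    for k in range(n):
        fwd.append([sum(fwd[k][xp] * f_k(xp, xc, Y[k], theta_hat[k])
                        for xp in (0, 1)) for xc in (0, 1)])
    bwd = [[1.0, 1.0] for _ in range(n + 1)]    # bwd[k]: message on X_k from the right
    for k in range(n - 1, -1, -1):
        bwd[k] = [sum(f_k(xp, xc, Y[k], theta_hat[k]) * bwd[k + 1][xc]
                      for xc in (0, 1)) for xp in (0, 1)]
    tables, gammas = [], []
    for k in range(n):
        joint = [[fwd[k][xp] * f_k(xp, xc, Y[k], theta_hat[k]) * bwd[k + 1][xc]
                  for xc in (0, 1)] for xp in (0, 1)]
        gamma = sum(map(sum, joint))            # denominator of (23)
        tables.append([[v / gamma for v in row] for row in joint])
        gammas.append(gamma)
    return tables, gammas


def h_message(k, table, theta):
    """h-message (21) of section k (0-based index into Y), evaluated at theta."""
    return sum(table[xp][xc] * math.log(f_k(xp, xc, Y[k], theta))
               for xp in (0, 1) for xc in (0, 1))


def brute_force_pair(theta_hat, k):
    """Same pairwise distribution by enumerating every path (check only)."""
    table = [[0.0, 0.0], [0.0, 0.0]]
    for path in itertools.product((0, 1), repeat=len(Y) + 1):
        w = F0[path[0]]
        for j in range(len(Y)):
            w *= f_k(path[j], path[j + 1], Y[j], theta_hat[j])
        table[path[k]][path[k + 1]] += w
    total = sum(map(sum, table))
    return [[v / total for v in row] for row in table]


theta_hat = [1.0, 1.0, 1.0]
pairs, gammas = pairwise_marginals(theta_hat)
h_first = h_message(0, pairs[0], 0.7)
```

Since the messages are left unscaled, the entries of `gammas` should coincide across sections, matching the remark on the denominator of (23).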
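The downwards message (11) is 

\begin{align}
(\hat{\theta}_1, \ldots, \hat{\theta}_n)^T &= \operatorname*{argmax}_{\theta_1,\ldots,\theta_n} \big( \log f_A(\theta) + h_1(\theta_1) + \cdots + h_n(\theta_n) \big) \tag{24} \\
&= \operatorname*{argmax}_{\theta_1,\ldots,\theta_n} \big( f_A(\theta) \cdot e^{h_1(\theta_1)} \cdots e^{h_n(\theta_n)} \big). \tag{25}
\end{align}

If $f_A$ has itself a nice factorization, then (24) or (25) may be 
computed by the standard max-sum or max-product algorithm, 
respectively. This applies, in particular, for the standard case 
$\Theta_1 = \Theta_2 = \cdots = \Theta_n$, which is illustrated in Fig. 3. 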
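
For the case of a single shared parameter, the maximization (24) becomes particularly simple when the parameter is restricted to a finite grid, which is an assumption of this sketch rather than of the text. The tabulated messages are added pointwise and the best grid point is selected. The function name `downward_shared` and the numbers in the example are hypothetical.

```python
def downward_shared(log_f_a, h_tables):
    """Index of the grid point maximizing log f_A + h_1 + ... + h_n, as in (24),
    when all components of the parameter vector are constrained to be equal."""
    totals = [la + sum(h[i] for h in h_tables) for i, la in enumerate(log_f_a)]
    return max(range(len(totals)), key=totals.__getitem__)


# Hypothetical example: three candidate values, constant f_A, two h-messages.
grid = [0.5, 1.0, 1.5]
log_f_a = [0.0, 0.0, 0.0]
h_tables = [[-1.0, -0.5, -2.0], [-3.0, -1.0, -0.5]]
best = grid[downward_shared(log_f_a, h_tables)]
```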

Fig. 2. Factor graph corresponding to (17). 

Fig. 3. Factor graph of $\Theta_1 = \Theta_2 = \cdots = \Theta_n$. 

Fig. 4. $h$-message out of a generic node. 

The above derivations do not in any essential way depend on 
the specific example (17). In principle, any cut-set of edges in 
some factor graph may be chosen to be the vector $\Theta$. However, 
the resulting subgraphs corresponding to $f_A$ and $f_B$ should 
be cycle-free in order to permit the computation of exact 
expectations ($h$-messages) and maximizations ($\hat{\theta}$-messages). 
The $h$-message out of a generic node $g(z_1, \ldots, z_m, \theta_k)$ (cf. 
Fig. 4) is 



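$$h(\theta_k) = \gamma^{-1} \int_{z_1} \cdots \int_{z_m} g(z_1,\ldots,z_m,\hat{\theta}_k) \, \mu(z_1) \cdots \mu(z_m) \log g(z_1,\ldots,z_m,\theta_k) \tag{26}$$

with 

$$\gamma \triangleq \int_{z_1} \cdots \int_{z_m} g(z_1,\ldots,z_m,\hat{\theta}_k) \, \mu(z_1) \cdots \mu(z_m) \tag{27}$$

and where $\mu(z_1), \ldots, \mu(z_m)$ are the standard sum-product 
messages. Obviously, this message passing rule may also be 
applied to a (sub-) graph with cycles, but then there is no 
guarantee for (6). 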
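
The rule (26) with the normalization (27) can be written as a small generic routine. This is a sketch under the assumption that every $z_i$ takes values in a finite alphabet indexed from 0, so that each incoming message is a list of nonnegative numbers. The node function `g`, the table `VALUES`, and the messages `MU1` and `MU2` are hypothetical.

```python
import itertools
import math


def h_message(g, messages, theta_hat):
    """Return the function theta -> h(theta) of (26).

    g(z_1, ..., z_m, theta) is the node function on index-valued arguments,
    and messages[i][z] is the incoming sum-product message mu(z_i = z).
    """
    configs = list(itertools.product(*[range(len(m)) for m in messages]))
    weights = [g(*z, theta_hat) * math.prod(m[zi] for m, zi in zip(messages, z))
               for z in configs]
    gamma = sum(weights)                       # normalization (27)

    def h(theta):
        return sum(w * math.log(g(*z, theta))
                   for w, z in zip(weights, configs)) / gamma
    return h


# Hypothetical node with two binary neighbours and one parameter.
VALUES = [-1.0, 1.0]


def g(z1, z2, theta):
    return math.exp(-0.5 * (VALUES[z2] - theta * VALUES[z1]) ** 2)


MU1 = [0.3, 0.7]
MU2 = [0.6, 0.4]
h = h_message(g, [MU1, MU2], theta_hat=1.0)
```

Because of the division by `gamma`, rescaling any incoming message leaves the resulting $h$-message unchanged.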

V. Conclusion 

Elaborating on prior work by Eckford, we have formulated 
EM in message passing form with a new message computation 
rule (26). In this setting, a main attraction of EM is that this 
message passing rule can be evaluated in some cases where the 
standard sum-product or max-product rules yield intractable 
expressions. It is likely that "local" use of this message 
computation rule can give good results even in situations 
where the "global" conditions required to guarantee (6) are 
not satisfied. 

Appendix: Proof of (6) 

The proof is standard (cf. [6]), but adapted to the slightly 
nonstandard setup of Section II. 

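Lemma: The function 

$$\tilde{f}(\theta,\theta') \triangleq f(\theta') + \int_x f(x,\theta') \log \frac{f(x,\theta)}{f(x,\theta')} \tag{28}$$

satisfies both 

$$\tilde{f}(\theta,\theta') \leq f(\theta) \tag{29}$$

and 

$$\tilde{f}(\theta,\theta) = f(\theta). \tag{30}$$

□ 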

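Proof: The equality (30) is obvious. The inequality 
(29) follows from eliminating the logarithm in (28) by the 
inequality $\log x \leq x - 1$ for $x > 0$: 

\begin{align}
\tilde{f}(\theta,\theta') &\leq f(\theta') + \int_x f(x,\theta') \left( \frac{f(x,\theta)}{f(x,\theta')} - 1 \right) \tag{31} \\
&= f(\theta') + \int_x f(x,\theta) - \int_x f(x,\theta') \tag{32} \\
&= f(\theta). \tag{33}
\end{align}

■ 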

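To prove (6), we first note that (5) is equivalent to 

$$\hat{\theta}^{(k+1)} = \operatorname*{argmax}_{\theta} \tilde{f}(\theta,\hat{\theta}^{(k)}). \tag{34}$$

We then obtain 

\begin{align}
f(\hat{\theta}^{(k)}) &= \tilde{f}(\hat{\theta}^{(k)},\hat{\theta}^{(k)}) \tag{35} \\
&\leq \tilde{f}(\hat{\theta}^{(k+1)},\hat{\theta}^{(k)}) \tag{36} \\
&\leq f(\hat{\theta}^{(k+1)}), \tag{37}
\end{align}

where (35) follows from (30), (36) follows from (34), and (37) 
follows from (29). 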
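
The two properties of the lemma can also be checked numerically. The sketch below assumes a hypothetical strictly positive toy function `f_joint` over three values of $x$ (not from the paper) and evaluates (29) and (30) on a coarse grid of parameter pairs.

```python
import math

# Hypothetical toy model (not from the paper).
X_VALUES = (0, 1, 2)
WEIGHTS = [0.5, 0.3, 0.2]
CENTERS = [-2.0, 0.5, 3.0]


def f_joint(x, theta):
    """Strictly positive f(x, theta)."""
    return WEIGHTS[x] * math.exp(-0.5 * (theta - CENTERS[x]) ** 2)


def f_marginal(theta):
    """f(theta) as in (2)."""
    return sum(f_joint(x, theta) for x in X_VALUES)


def f_tilde(theta, theta_prime):
    """Auxiliary function of (28)."""
    return f_marginal(theta_prime) + sum(
        f_joint(x, theta_prime)
        * math.log(f_joint(x, theta) / f_joint(x, theta_prime))
        for x in X_VALUES)


grid = [i / 4.0 for i in range(-16, 17)]
# Largest value of f_tilde(theta, theta') - f(theta); (29) says it is <= 0.
max_violation = max(f_tilde(t, tp) - f_marginal(t) for t in grid for tp in grid)
# Largest deviation from (30) on the diagonal theta = theta'.
max_gap_diag = max(abs(f_tilde(t, t) - f_marginal(t)) for t in grid)
```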